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Abstract 

We devise methods for finding approximations of the generalized inverse of 
the graph Laplacian matrix, which arises in many graph-theoretic applications. 
Finding this matrix in its entirety involves solving a matrix inversion problem, 
which is resource demanding in terms of consumed time and memory and hence 
impractical whenever the graph is relatively large. Our approximations use only 
few eigenpairs of the Laplacian matrix and are parametric with respect to this 
number, so that the user can compromise between effectiveness and efficiency of 
the approximated solution. We apply the devised approximations to the prob- 
lem of computing current-flow betweenness centrality on a graph. However, 
given the generality of the Laplacian matrix, many other applications can be 
sought. We experimentally demonstrate that the approximations are effective 
already with a constant number of eigenpairs. These few eigenpairs can be 
stored with a linear amount of memory in the number of nodes of the graph 
and, in the realistic case of sparse networks, they can be efficiently computed 
using one of the many methods for retrieving few eigenpairs of sparse matrices 
that abound in the literature. 

Key words: Spectral Theory; Graph Laplacian; Network Science; Current 
flow; Betweenness Centrality 



1. Introduction 



The graph Laplacian is an important matrix that can tell us much about 
graph structure. It places on the diagonal the degrees of the graph nodes and 
elsewhere information about the distribution of edges among nodes in the graph. 
The graph Laplacian matrix, as well as its Moore-Penrose generalized inverse 
( Ben -Israel and Gr evilld . l2003h . are useful tools in network science that turn up 
in many different places, including random walks on networks, resistor networks, 
resistance distance among nodes, node central i ty measures, gra ph partitioning, 



and network connectivity ( Ghosh et al. . 2008 ; Newman . 2010l) . In particular, 



among measures of centrality of graph nodes, betweenness quantifies the extent 
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to which a node lies between other nodes. Nodes with high betweenness are 
crucial actors of the network since they control over information (or over what- 
ever else flows on the network) passing between others. Moreover, removal from 
the network o f these brokers might seriously disrupt comm unications between 
other vertices ( Newmanl . 120051: iBrandes and Fleischer , 20051) . 



The computation of the generalized inverse of the Laplacian matrix is de- 
manding in terms of consumed time and space and thus it is not feasible on rel- 
atively large networks. On the other hand, there are today large databases from 
which real networks can be cons tructed, including technologi c al, information 
socia l , and bio l ogical networks ( Brandes and Erlebachi 12005 ; Newman et al 



2006t iNewmanl . |2010l) . These networks are voluminous and grow in time as 



more data are acquired. We have, therefore, the problem of running computa- 
tionally heavy algorithms over large networks. The solution investigated in the 
present work is the use of approximation methods: algorithms that compute a 
solution near to the exact one and, as a compromise, that run using much less 
resources than the exact algorithm. 

We propose a couple of approximation methods to compute the generalized 
inverse of the Laplacian matrix of a graph. Both methods are based on the 
computation of few eige npairs (eigenvalues and the corresponding eigenvectors) 
of the Laplacian matrix ( Golub and Meurant . 201dh . where the number of com- 



puted eigenpairs is a parameter of the algorithm. The first method, called cutoff 
approximation, uses the computed eigenpairs in a suitable way for the approx- 
imation of the actual entries of the generalized inverse matrix. The second 
method, named stretch approximation, takes advantage of both the computed 
eigenpairs as well as of an estimation of the excluded ones. Both approximation 
methods can be applied to estimate current-flow betweenness centrality scores 
for the nodes of a graph. We experimentally show, using both random and 
scale-free network models, that the proposed approximation are both effective 
and efficient compared to the exact methods. In particular the stretch method 
allows to estimate, using a feasible amount of time and memory, a ranking of 
current-flow betweenness scores that strongly correlates with the exact ranking. 

The layout of the paper is as follows. Section [5] introduces the notions of 
Laplacian matrix and its Moore-Penrose generalized inverse and recalls some 
basic properties of these matrices. In Section [3] we define cutoff and stretch 
approximations. Moreover, we theoretically show that stretch approximation is 
more effective than cutoff approximation. We review the methods for inverting a 
matrix and for finding few eigenpairs of a matrix, which are crucial operations in 
our contribution, in Section 2] Current-flow betweenness centrality is illustrated 
in Section [5] We formulate the definition in terms of the generalized inverse of 
the Laplacian matrix, which allows us to use cutoff and stretch approximations 
to estimate betweenness scores. A broad experimental analysis is proposed 
in Section [6] in order to investigate effectiveness and efficiency of the devised 
approximation methods. Section [7] concludes the paper. 

2. The graph Laplacian and its generalized inverse 

Let Q = (V, E, w) be an undirected weighted graph with V the set of nodes, 
E the set of edges, and w a vector such that w% > is the positive weight of edge 
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i, for i = 1, . . . , \E\. We denote by n the number of nodes and m the number of 
edges of the graph. The weighted Laplacian of Q is the symmetric matrix 



G = D-A 



where A is the weighted adjacency matrix of the graph and D is the diagonal 
matrix of the generalized degrees (the sum of the weights of the incident arcs) 
of the nodes. 

In order to obtain more insight on the properties of the graph Laplacian it 
is useful to express the matrix in another form. Let B 6 K™ xm b e the incidence 
matrix of the graph such that, if edge I connects two arbitrarily ordered nodes i 
and j then Bi t i = 1, Bjj = — 1, while B^.i — for k ^ i, j. Given a vector v, the 
square diagonal matrix whose diagonal entries are the elements of v is denoted 
with Diag(u). It holds that G = B Diag(w)£? T . Thus G besides symmetric 
is positive semidefinite, so that it has real and nonnegative eigenvalues that is 
useful to order < Ai < A2 < . . . < A n . If e denotes a vector of ones, then 
De = Ae so that Ge = 0. It follows that Ai = is the smallest eigenvalue of 
G. We will assume throughout the paper t hat Q is connected . In this case all 



other eigenvalues of G are strictly positive (jGhosh et all [2008) 



= Xi < A 2 < ... < A„. 

Since Ai = 0, the determinant of G is null and hence G cannot be inverted. 
As a substitute for the inverse of G we use the Moor e-Penrose generalized in- 



verse of G, that we simply call generalized inverse of G (Ben-Isr ael and Greville 



2003f ). As customary, we denote this kind of generalized inverse with G + . It is 



convenient to define G + starting from the spectral decomposition of G. Actu- 
ally, since G is symmetric it admits the spectral decomposition 

G = VAV T 



where A = Diag(0, A2, . . . , A„) and the columns of V are the eigenvectors of G. 
Notice that V is an orthogonal matrix, that is VV T = I = V T V. 

By using the spectral decomposition of G, its generalized inverse can be 
defined as follows 

1 1 "l 

G+ = VDiag(0, r , . . . , -)V T = £ V (:,j)V(:,j) T , (1) 

where V(:,j) (respectively, V(j, :)) denotes the j-th column (respectively, row) 
of matrix V. 

Observe that G + inherits from G the property of being symmetric and pos- 
itive semidefinite. Moreover, G + shares the same nullspace of G, as is true 
in general for the Moore-Penrose generalized inverse of a symmetric matrix. 
Thus, since Ge = 0, it turns out that G + e = 0. By setting J = ee T , it 
follows that GJ — JG = G + J = JG + = O, where O is a matrix of all ze- 
ros. By using the eigendecompositions of G and G + it is easy to show that 
(G + l/nJ)(G+ + 1/raJ) - I. It follows that 

G+ = (G + ee T /n)- 1 - ee T /n, (2) 
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a formula that can be found in Ghosh et al. and is used implicitly in 

Brandes and Fleischerl ((2005). Another useful consequence of the above equali- 
ties is 

GG+ = G + G = I -1/nJ. 

The generalized inverse of the graph Laplacian G is useful to solve a linear 
system of the form Gv — b for some known vector b, which arises in many 
applications. The range of a matrix G is the linear space of vectors b for which 
the system Gv — b has a solution. Since G is symmetric its range is the space 
orthogonal to its nullspace. The nullspace of G is one-dimensional and spanned 
by the vector e with all components equal to unity. Hence, the range of G is 
made up by the vectors x such that e T x — 0, or, equivalently, that sum up to 
zero. It follow that the linear system Gv — b has solutions if b sums up to zero. 
By the linearity, the difference of two solution belongs to the nullspace of G, 
and this implies that if we are able to find an arbitrary solution v* , then all the 
other solutions are of the form v* + ae, a 6 K. 

As well known, v* = G + b is the minimum Euclidean norm solution of the 
system Gv = b, i.e., it is the element having minimum Eu clidean norm in the 
afhnc space of the solutions ([Ben-Israel and Gr evilld. l2003h . For completeness, 
we notice that, no matter if b belongs to the range of G or not, v* = G + b is the 
minimum Euclidean norm solution of the problem min x \\Gx — &|| 2 - 



3. Approximations of the generalized inverse 

In this section we propose two approximations of the the generalized inverse 
G + of the graph Laplacian matrix G. For k = 2, . . . , n, we define the fc-th cutoff 
approximation of G + as: 

T^ = i2^V(;,j)V(:jf. (3) 
3=2 3 

In all computations, we never materialize matrix T^ k \ but we represent it 
with the k — 1 eigenpairs that define it. This representation of can be 
stored using 0(kn) space, that is 0(n) if k is constant. Moreover, computing 
an entry of T^ k ' using its eigenpair representation costs 0(fc), that is 0(1) if k 
is constant. 

As k increases, the matrices T^ k ' are more and more accurate approximations 
of G+. Actually G+ = T( n ' and, for k < n, 

n , 

G+=T W + £ -V(:,j)V(:,jf. (4) 

j=k+l 3 

It holds that ||G + - T^h < ||G+ -MII2 fo r every Mel" x " having rank 
less or equal to k— 1 (|Golub and Van Loanl . ll996l ). Moreover, for k = 2, ... , n— 1, 
the relative 2-norm error of the fc-th cutoff approximation is: 

\\G+ - TWJI2 = 1/Afc+i = A 2 
||G+||2 I/A2 Ak+i" 
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The second approximation of G + exploits the following observation. If many 
of the excluded eigenvalues Xj, for j larger than k, are close to each other, we 
might approximate them with a suitable value a. We define the A;-th stretch 
approximation of G + as: 

n , 

S W =r W+ £ ^V(:,j)V(:,jf. (5) 
j=k+l 

It is worth observing that the use of does not involve any significant 
additional cost with respect to the use of T^ k \ Indeed, since Y^j=i 
,j)T = W T = I, then 



S {k) = T^-^V(:,j)V(:,jr + J2-V(-,j)V(:,j) T 
o=l 3=1 

k 

= -I -Vt, l)V(:, If + £(1 ~ bv(:J)V(:,jf. 

3=2 3 

Notice that the normalization of the eigenvector of G associated with the 
eigenvalue Ai = yields V(:, 1) = e/ ' \fn 1 where e is a vector with all components 
equal to unity. It follows that, knowing the value tr, the fc-th stretch approxi- 
mation S( k > can be represented using k — 1 eigenpairs, hence the space needed 
to store the representation of S 1 ^ and the time needed to compute its entries 
do not increase with respect to the use of T^ k \ 

On the other hand, the use of instead of T<» allows to improve the 
bound on the approximation error. Actually, since 

G+-S^=G+-T^- y -V(:,j)V(:,jf, 

3 = k+l 

from Equation U we obtain 



IIG+ - S^\\ 2 = || E (f - -)V(:,j)V(:,j) T \\ 2 = . max |1 - -j 

* — ' A., (7 j=k+l,...,n A," a 

j=k+l 3 3 

Assuming, as reasonable, that Afc+i < a < A„, we have that 

1 1111 1 

X n Xj o Xj Afc + i Aj 

and hence 

,11, 1 1 
max | 1 < — = 7 

j=k+l,...,n Aj a Afc + i A„ 



so that 



l|G+-ff( fc >|| 2 A 2 _ ||G+-TW|| 2 

l|G+|| 2 " 27 A fc+1 ||G+|| a ' 
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Therefore, the relative 2-norm error of the stretch approximation is 
strictly less than the relative 2-norm error of the cutoff approximation , as 
soon as we chose a within Afc+i and A n . Moreover, the closer Afc+i and A n , the 
better the stretch approximation. 

The optimal choice for er, that is, the value that minimizes the 2-norm rela- 
tive error, is the harmonic mean of Xk+i and A„: 

1 1/1 1 



a 2 \Afe+i A n , 
With this choice we reduce of one half the bound of the approximation error: 

max 1 < — . 

j=k+l,...,n Aj (J 2 

We have used this choice of a in all our experiments. Notice that the com- 
putation of a implies computing two additional eigenvalues, namely Afc+i and 
A„, but not the corresponding eigenvectors. To avoid this additional cost, we 
might reasonably assume that A„ is big so that its reciprocal is small, and that 
Afc+i is close to Afc, so that the optimal value of a is approximately 2A&. 



4. Methods for matrix inversion and for finding few eigenpairs 

Finding the generalized inverse of the graph Laplacian matrix involves solv- 
ing a matrix inversion problem (Equation [2]) . Inverting a matrix is, however, 
computational demanding in terms of used time and memory. Given a matrix 
A, the columns of A^ 1 can be computed by solving the linear systems Ax = 
for i — 1, . . . , n, where is the vector whose i-th entry is equal to one and 
the other entries are equal to zero. If a direct method is used, then A is fac- 
torized and the factorization is used to solve the systems. This typically costs 
0(n 3 ) floating point operations and 0(n 2 ) memory locations (the i nverse of a 
matr ix is almost invariably dense even if the input matrix is sparse) dAho" et al 



1974). In particular, the complexity of matrix product and matrix inversion 
are the same, and the best known lower bo und for ma trix product, obtained for 
bounded arithmetic circuits, is Q(n 2 logn) ( Rad . 2003 ). If an iterative method is 



used then, in the case where A has a conditioning independent from the dimen- 
sion, or a good preconditioner can be found, the number of iterations becomes 
independent from the dimension. Since every iteration costs 0(m), then the 
cost is O(mn) floating point operations and 0(n 2 ) memory locations to store 
the inverse. If the matrix is sparse, the number of operations is quadratic. Oth- 
erwise the number of operations has to be multiplied by an additional factor 
dependent from the conditioning of A. For an intr oduction to i terative methods 
to solve linear systems and to preconditioning see ISaad 



Instead of computing the entire generalized inverse of the Laplacian ma- 
trix, our approximation methods compute and store only few eigenpairs of the 
Laplacian matrix. If a matrix A is big and sparse, then the computation of 
few eigenpairs of A can be made by means of iterative methods whose basic 
building block is the product of A by a vector, which has linear complexity 
if A is sparse. One of the simple st among these methods is orthogonal itera- 
tion ( Golub and Van LoanL 19961 ). a generalization of the power method. The 
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method, while simple, can be quite slow since the number of iterations depends 
on the distance between the sought eigenvalues of A and experimental evidence 
shows tha t the eigenvalues n earest to zero are clustered, in particular for sparse 



networks ( Zhan et al. . 2010f l 



On the other hand, one of the most widely used alg orithms is the Lanczo s 



method with implicit restart, implemented by ARPACK ([Lehoucq et all 119981 ). 
This is the method we have used in our experiments. For the computation of 
few smallest eigenpairs of a matrix A the method works in the so called shift 
and invert mode. In other words the Lanczos method is applied to (A — c/) -1 
being a a suitable shift. To do this, the matrix A — al is factorized before 
the iteration begins and the factorization is used to solve the sequence of linear 
systems that arises during the calculation. This accelerates the method but the 
factors are surely much less sparse than the matrix itself. This, combined with 
the clustering of the eigenvalues near zero, leads to a nonlinear scaling, which 
was observed also in our experiments. 

Alternative a pproaches are Jacobi-Dayidson and Deflation Accelerated Con- 
jugate Gradient ([Bergamaschi and Puttil . 120021 ). that seem to be highly compet- 
itive with the Lanczos method. In particular in the Jacobi-Davidson method it 
is still needed to solve inner linear systems, but the factorization is avoided and 
substituted by the use of preconditioned iterative Krylov spaces based methods. 
Deflation Accelerated Conjugate Gradient sequentially computes the eigenpairs 
by minimizing the Rayleigh quotient q(z) — z T Az/z T z over the subspace or- 
thogonal to the eigenvectors previously computed. Finally, we mention the mul- 
tilevel algorithm implemented in the HSL_MC73 routine of the HSL mathemat- 
ical software library, that however only computes the second smallest eigenpair 
( Hu and Sco"ttl . l2003l ). 



5. Current-flow betweenness centrality 



A large v olume of research on networks has been devoted to the concep t 
of centrality (|Sabidussil . Il966t iFreemanl . Il979t iBorgattil 120051: iNewmanl . |2010|) . 

This research addresses the fundamental question: Which are the most impor- 
tant or central vertices in a network? There are four measures of centrality 
that are widely used in network analysis: degree centrality, eigenvector central- 
ity, closeness, and betweenness. Here, we focus on betweenness centrality. 

Betweenness measures the extent to which a node lies on paths between 
other nodes. Nodes with high betweenness might have considerable influence 
within a network by virtue of their control over information (or over whatever 
else flows on the network) passing between others. They are also the ones 
whose removal from the network will most disrupt communications between 
other vertices because they lie on the largest number of paths between other 
nodes. 

Typically, only geodesic paths are considered in the definition, obtaining a 
measure that is called shortest-path betweenness. Here, we study current-flow 
betweenness, which includ es contributions of all paths, although l onger paths 
give a lesser contribution ( Newmanl . 2005; B randes and Fleischerl . I2005T ). For 
a given node, current-flow betweenness measures the current flow that passes 
through the vertex when a unit of current is injected in a source node and 
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removed from a target node, averaged over all source-target pairs. Equivalently, 
it is equal to the net number of times that a random walk on the graph passes 
through the node on its journey, averaged over a large number of trials of the 
random walk. 

We next give the precise definition of current-flow betweenness centrality in 
terms of resistor networks. Consider a network in which the edges are resistors 
and the nodes are junctions between resistors. Each edge is assigned with a 
positive weight indicating the conductance of the edge. The resistance of an 
edge is the inverse of its conductance. Outlets are particular nodes where current 
enters and leaves the network. A vector u called supply defines them: a node i 
such that Ui 7^ is an outlet; in particular, if Ui > then node i is a source and 
current enters the network through it, while if Ui < then node i is a target and 
current leaves the network through it. Since there should be as much current 
entering the network as leaving it, we have that 52 • Ui — 0. We consider the 
case where a unit of current enters the network at a single source s and leaves 
it at a single target t. That is, uf^ — for i ^ s,t, u{ s '^ — 1, and = — 1. 
We are interested in how current flows through the network, for an arbitrary 
choice of source and target outlets. 

(s t) 

Let v) ' be the potential of node i, measured relative to any convenient 
reference potential, for source s and target t outlets. Kirchhoff 's law of current 
conservation implies that the node potentials satisfy the following equation for 
every node i: 

EAi^-fV^' (6) 
3 

where A is the weighted adjacency matrix of the network. The current flow 
through edge is the quantity Ai t j(v[ s — Vj S ^), that is, the difference 

of potentials between the involved nodes multiplied by the conductance of the 
edge: a positive value indicates that the current flows in a direction (say from i 
to j), and negative value means that the current flows in the opposite direction. 
Hence, Krichhoff 's law states that the current flowing in or out of any node is 
zero, with the exception of the source and target nodes. 

In matrix form, equation [B] reads: 

(D - A)v^ s '^ = Gv^ = u (s ' 4) (7) 

where D is a diagonal matrix such that the i-th element of the diagonal is 
equal to J2j Ai.j, that is, it is the (generalized) degree of node i. Recall that 
G = D — A is the graph Laplacian matrix. As noticed in Section [2] if G + is the 
generalized inverse of the Laplacian matrix G, then the potential vector: 

„(-,*) = g+ u (m) (8 ) 

This means that the potential of node i with respect to source s and target 
t outlets is given by vj. s,t ' — Gf s — Gf r Therefore, the generalized inverse 
matrix G + contains information to compute all node potentials for any pair of 
source-target nodes. 




Figure 1: A resistor network with all resistances equal to unity. Each node is identified with 
a letter and is labelled with the value of its potential when a unit current is injected in node 
A and removed from node H. Each edge is labelled with the absolute current flowing on it. 



An example of resistor network with node potential solution is provided in 
Figure [TJ Notice that Kirchhoff's law is satisfied for each node. For instance, 
the current entering in node B is 0.47 (from node A) which equals the current 
leaving node B, which is again 0.47 (0.13 to E, 0.27 to F, and 0.07 to C). 
Moreover, the current leaving the source node A is 1, and the current entering 
the target node H is also 1. Notice that there is no current on the edge from 
C to D, since both nodes have the same potential. Any other potential vector 
obtained from the given solution by adding a constant is also a solution, since 
the potential differences remain the same, and hence Kirchhoff's law is satisfied. 
The given potential vector is, however, the solution with minimum Euclidean 
norm. 

We have now all the ingredients to define current-flow betweenness centrality. 
As observed above, given a source s and a target t, the absolute current flow 
through edge is the quantity A;j {v^'^ — Vj \. By Kirchhoff's law the 

current that enters a node is equal to the current that leaves the node. Hence, 

(s t) 

the current flow F£ ' through a node i different from the source s and a target 
t is half of the absolute flow on the edges incident in i: 

3 

(s t) (s t) 

Moreover, the current flows F s ' ' and F£ ' through both s and t are set 
to 1, if end-points of a path are considered part of the path (this is our choice 
in the rest of this paper), or to otherwise. Figure [5] gives an example. Notice 
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Figure 2: A resistor network with all resistances equal to unity (this is the same network of 
Figure [TJ . Each node is now labelled with the value of flow through it when a unit current is 
injected in node A and removed from node H. Each edge is labelled with the absolute current 
flowing on it. 



that the flow from A to H through node G is 1 (all paths from A to H pass 
eventually through G), the flow through F is 0.4 (a proper subset of the paths 
from A to H go through F and these paths are generally longer than for G), 
and the flow through E is 0.13 (a proper subset of the paths from A to H go 
through E and these paths are generally longer than for F) 

Finally, the current-flow betweenness centrality bi of node i is the flow 
through i averaged over all source-target pairs (s,t): 

1 (l/2)n(n-l) { ' 

Since the potential uj s = Gf s — Gf t , with G + the generalized inverse of 
the graph Laplacian, Equation [S] can be expressed in terms of elements of G + 
as follows: 

= |E J -A J |G+ +G+ -G+ -G+| (11) 

Hence, if we replace in Equation II II matrix G + with its fc-th cutoff approx- 
imation (or its /c-th stretch approximation S^), we get an approximated 
value of current- flow betweenness centrality for node i. 

The computational complexity of Equation [TU] is as follows. We denote with 
ki the number of neighbors of node i, that is, the number of edges incident in 
i. Assuming we have matrix G + , computing Equation QT] for given i, s, and t 
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costs O(ki), if i has at least one neighbor, or 0(1) otherwise. Hence Equation 
[10] for a specific i costs 0(n 2 ■ min(fe,, 1)). Since J^- min(fei, 1) = 0(n + m), 
Equation [10] for all nodes costs 0(n 2 (n + m)), that is, 0(n 3 ) if the graph is 
sparseQ Furthermore, the complexity of computing the equation using either 
cutoff or stretched approximations increases of a factor of k, since an entry 
of T( fe ) or of 5*^ can be obtained in O(k). If k is contant with respect to n, 
then there is no asymptotic increase of complexity. Clearly, if n is large, the 
cost of Equation [TU] is prohibitive. 

A possible solution for the bottleneck of Equation [TU] is the adoption of a 
probabilistic approach by computing Equ ation [TT1 only for a sample of source- 



target pairs (jBrandes and Fleischerl 120051 ) . If we choose at random an source 



nodes, with < a < 1, and, for each source, we choose at random an target 
nodes, then the sample of source-target pairs has size a 2 n 2 . For instance, if 
a = 0.1, then a 2 = 0.01 and hence the cost of computing Equation ITUl declines 
of two orders of magnitude. 

An alternative solution to avoid the bottleneck of Equation [TU] is to use 
cutoff approximation with k — 2 (only the second smallest eigenpair is used). 
In this case Equation [T0l significantly simplifies. Indeed, if k = 2, then T^J = 
{l/X2)Vi^Vy2- It follows that flow can be approximated by: 

= 5 fe|V., 3 -Vt, a |E i ^|K l2 -^,a| 

Notice that the sum in the formula for the flow is now independent on the 
source-target pair. Hence the approximated betweenness of i is: 



1 



h = rtf^T) ^ 2T |K - 2 ~ ^IE a mI^, 2 - v j>2 \ = cJ2A itj \v it2 - v j>2 \ 

where C is a constant. This means that the approximated betweenness 6j is 
proportional to the quantity • A it j \ — Vj$ \ . Notice that 6, is a local version 
of bi that depends only on the neighborhood of i. Ignoring the multiplicative 
constant C, the approximated betweenness bi can be computed in 0(n + m) for 
all nodes, hence in linear time with respect to the size of the network. 

We conclude this section by computing exact and approximated betweenness 
centrality on a real network. The instance is a social network of dolphins ( Tur- 
siops truncatus) belonging to a community that lives in the fjord of Doubtful 
Sound in New Zealand. The unusual conditions of this fjord, with relatively 
cool water and a layer of fresh water on the surface, have limited the departure 
of dolphins and the arrival of new individuals in the group, facilitating a strong 
social relationship within the dolphin community. The network is an undirected 



1 This cost can be improved t o 0(mn log n), hence 0(n 2 log n) for sparse networks, as shown 
in lBrandes and Fleischerl (j20Q5T >. 
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Exact betweenness 



Approximated betweenness 




Figure 3: A dolphin social network. The size of the nodes is proportional to the current-flow 
betweenness. The exact solution is shown on the left and the approximated one is shown on 
the right. Black nodes are the top-10 leaders in both networks (9 of which are shared by the 
two networks). The approximation uses the stretch method with 3 eigenpairs. 



unweighted graph containing 62 nodes (dolphins) and 159 non-directional con- 
nections between pairs of dolphins. Two dolphins are joined by an edge if, dur- 
ing the observation period lasted from 1994 to 2001, they were spotted together 
more often than expected by chance. This network has been extensively stud- 
ied by David Lusseau and co-authors, see, for instance, (jLusseau and Newmanl . 
2004) . 



The ranking obtained with the cutoff approximation method using only three 
eigenpairs of the graph Laplacian correlates at 0.92 with the exact ranking, and 
the mean change of rank between the two compilations is 5.4. Moreover, the 
stretch approximation method performs even better. Using the same number 
of eigenpairs (three), the approximated and exact rankings correlate at 0.99 
and the mean change of rank between the two compilations is just 2 positions. 
Figure [3] depicts the dolphin social network where the size of the node is pro- 
portional to its exact current-flow betweenness (graph on the left) and to its 
approximated current-flow betweenness (graph on the right). Moreover, Table 
[T] shows the top-10 of both compilations and Figure |4] gives the scatter plot of 
the two rankings. Nine dolphins over 10 are present in both top-10 rankings: 
the missing dolphins (DN63 and Grin) both rank 11th in the other ranking. 
Six dolphins have the same rank in both compilations (notably, the top-4 is the 
same). In fact, the stretch approximation method is already effective using just 
one eigenpair (correlation at 0.98 and mean change at 2.9). These outcomes sug- 
gest that the proposed approximation methods for betweenness, in particular 
the stretch version, are effective. 
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Dolphin Exact Dolphin Approx 



Beescratch 


0.254 


Beescratch 


0.290 


SN100 


0.244 


SN100 


0.266 


Jet 


0.209 


Jet 


0.222 


SN9 


0.189 


SN9 


0.222 


Web 


0.183 


SN4 


0.220 


DN63 


0.181 


Trigger 


0.217 


Upbang 


0.179 


Upbang 


0.215 


SN4 


0.177 


Web 


0.211 


Kringel 


0.176 


Kringel 


0.206 


Trigger 


0.165 


Grin 


0.204 



Table 1: The top-10 rankings according to exact betweenness (column Exact) and approx- 
imated betweenness (column Approx). The approximation uses the stretch method with 3 
eigenpairs. 



6. Experimental evaluation 

This section is devoted to the experimental evaluation of effectiveness and ef- 
ficiency of the proposed approximations for the generalized inverse of the Lapla- 
cian of a graph, and in particular for the current-flow betweenness centrality 
scores of nodes. 



6.1. Experimental setting 

In our experiments, we took advantage of the following two well-known net- 
work models. The first mod el is the random model, also known as Erdos-Reny i 
mode l (ER model for short) ( Solomonoff and Rapoportill95ll : Erdos and RenyiL 



1960l ) . According to this model, a network is generated by laying down a num- 



ber of nodes and adding edges between them with independent probability for 
each node pair. This model generates a small world network with a binomial 
node degree distribution, which is peaked at a characteristic scale value. We 
will denote a graph drawn according to the ER model by ER(n, q), where n is 
the number of nodes and p = q/n is the edge probability. Such a graph has 
roughly m — q(n — l)/2 edges. An ER graph is not necessarily connected, but 
it contains a giant connected component containing most of the nodes as soon 
as q > 1. We extracted the giant component from the generated graphs and 
used this component in our experiments. 

However, many real networks are scale-free, that is, the degree distribu- 
tion has a long-tail that roughly follows a power law (|Newmanl . [2010h . A net- 



work model that captures the power law distribution of node degrees is known 
as cumulative advantage (|Simonl . 119551 Ide Solla Pricel Il976l ). which was later 
rediscovered and further investigated under the n ame preferential attachment 



or Barabasi- Albert m odel (BA model for short) (jBarabasi and Albert). [l99 



iBarabasi et al. . 19991 ). Such a model, as described bv barabasi and Alber 



(jl999t ). has the following two main ingredients: 

Startup: the graph has a single isolated node; 
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Figure 4: A scatter plot of exact and approximated betweenness rankings: the rank of a dol- 
phin in the exact ranking is plotted against the rank of the same dolphin in the approximated 
ranking. The approximation uses the stretch method with 3 eigenpairs. 



• Growth: additional nodes are added to the network one at a time; 

• Preferential attachment: each node connects to the existing nodes with a 
fixed number r of links. The probability that it will choose a given node 
is proportional to the number of links the chosen node already has. 

The resulting network is a small-world graph with a power law degree dis- 
tribution: most of the nodes (the trivial many) will have low degree and a small 
but significant share of nodes (the vital few or hubs) will have an extraordi- 
nary high degree. We will denote a graph drawn according to the BA model as 
BA(n, r), where n is the number of nodes and r is the number of edges attached 
to each node added during the preferential attachment step. Such a graph has 
m = r(n — 1) edges. Notice that the resulting graph is connected but it is not 
necessarily simple: multiple edges might exist between the same pair of nodes. 
We simplified the graph in our experiments, removing the multiple edges, if any. 

In this section we give results for unweighted networks only. We also exper- 
imented with weighted graphs with edge weights drawn from a random uniform 
distribution, but we noticed no particular discrepancy with respect to the un- 
weighted case. 

The experiments have been performed within the R computing environment, 
taking advantage of the igraph package for the generation o f networks. We 



used of R interface to LAPACK (Linear Algebra Package) ([Anderson et al 



mi for matrix inversion and ei genpairs genera tion, as well as the R interface 



to ARPACK (Arnoldi Package) ( Lehoucq et al. . 19981) . when few eigenpairs of 
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Figure 5: Plot of sorted eigenvalues (left plot) and of sorted degrees (right plot) on ER(n = 
1000, q = 10) 



BA model BA model 




Figure 6: Plot of sorted eigenvalues (left plot, up to index 980) and of of sorted node degrees 
(right plot, up to index 980) on BA(n = 1000, r = 5) 



large sparse matrices are necessary. Moreover, we exploited the C programming 
language for a faster implementation of the computation of betweenness scores 
using Equation I10[ calling the C compiled code from the R environment (R is 
rather slow at iterative algorithms that require loops iterated many times). All 
experiments are run on a MacBook Pro equipped with a 2.3 GHz Intel Core i5 
processor, 4 GB 1333 MHz DDR3 memory, running Mac OS X Lion 10.7.2. 

6.2. Distribution of eigenvalues and of node degrees 

We begin the experimental investigation with an exploratory analysis of the 
distribution of eigenvalues of the Laplacian, comparing with the distribution 
of node degrees on both ER and BA networks. This study is interesting to 
understand the behavior of the proposed approximations. 

The exploratory experiment uses graphs ER(rt = 1000, q = 10) and BA(n = 
1000, r = 5). Notice that the used graphs have the same number of nodes 
and approximately the same number of edges, hence a similar edge density. 
Nevertheless, they differ in the way the edges are placed among nodes. 

Figures [5] and |6] explore the distribution of eigenvalues of the Laplacian as 
well as the distribution of node degrees on ER and BA graphs, respectively. We 
notice that the eigenvalues of the Laplacian are well approximated by the degrees 
of the nodes, an interesting phenomenon already investigated in IZhan et al. 
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Figure 7: Scatter plot of eigenvalues versus node degrees on ER(n = 1000, q = 10) (left plot) 
and on BA(n = 1000, r = 5) (right plot) 



ER 

BA 




Figure 8: Plot of eigenvalues of Laplacian (left plot, up to index 980) and of generalized inverse 
Laplacian (right plot, up to index 980) on ER(n = 1000, q = 10) and BA(n = 1000, r = 5). 



Jot the ER model, eigenvalues and degrees have a correlation of 0.989; 
the Euclidean distance between the two sorted vectors, relative to the norm-2 
of the eigenvalue vector, is 0.13. Using a BA model, eigenvalues and degrees are 
correlated at 0.983 and the relative Euclidean distance between the two sorted 
vectors is 0.02. Figure [7] shows a scatter plot of Laplacian eigenvalues versus 
node degrees. 

The plots of the sorted eigenvalues of the Laplacian on ER and BA models 
are visibly different (Figure [8j left plot). In the initial part (up to index 800), 
ER eigenvalues grow faster than BA eigenvalues; however, in the tail of the 
plot, BA eigenvalues rapidly catch up, overtaking ER eigenvalues around index 
960. From here to the end of the sequence, BA eigenvalues literally explode 
(notice that, in order to appreciate the behavior of smaller eigenvalues, the 
figure shows the eigenvalue curve up to index 980). The eigenvalues of the 
generalized inverse of the Laplacian, which are the inverse of the Laplacian 
eigenvalues, are shown in Figure [5J right plot. Both curves decrease rapidly in 
the beginning, but the ER curve has a more significant drop; after this initial 
fall, both curves decreases gently, although the BA line declines faster. This 
might be a hint of the effectiveness of the stretch approximation method: the 
tail of the eigenvalue curve has a small slope, hence the eigenvalues of the tail 
can be well approximated by a single middle value. 
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Figure 9: Relative 2-norm error of cutoff and stretch approximations on BA(n. = 1000, r = 5) 
(left plot) and ER(n = 1000,? = 10) (right plot). 




Figure 10: Effectiveness of cutoff approximation for betweenness increasing the number of 
cigcnpairs from 1 to 25 on ER(n = 100, q) for different values of q (plot on the left) and on 
BA(n = 100, r) for different values of r (plot on the right). 



6. 3. Effectiveness of approximations 

In this section we investigate the effectiveness of approximations of the gen- 
eralized inverse of the Laplacian and, in particular, of the current-flow between- 
ness rankings. We first study the relative 2-norm error of the fc-th cutoff and 
stretch approximations, defined as the ratio between \\G + — T( fc )||2 and ||G + ||2 
(cutoff approximation error), and the ratio between \\G + — 5^||2 and ||G + ||2 
(stretch approximation error). Figure |H] shows the approximation error as the 
number of eigenpairs k grows from 1 to n. On both network models, the stretch 
approximation is significantly more effective than the cutoff one: on average, the 
stretch approximation error is 30% and 50% of the cutoff approximation error 
on ER and BA graphs, respectively. In particular, the stretch approximation is 
more effective on ER graphs, as predicted by the spectral behavior highlighted 
in Figure [8] 

We next test the effectiveness of the approximations applied to current-flow 
betweenness. The quality of the approximations is establishing by computing 
the correlation coefficient among the approximated and exact node rankings. 
We used the Pearson product-moment correlations, where the input data is 
logarithmically transformed when it is not normally distributed. This coefficient 
is largely used in sciences to measure the similarity of two rankings; it runs from 



17 



_i_ _ J- - dr z "a * ™ 



O 



/ 



/ 



-e- cut off (ER) 

,° -A- stretched (ER) 

— I— cut off (BA) 

D ' -*- stretched (BA) 

~1 1 1 — I 1 — 

5 10 15 20 25 



Number of eigenpairs 



Figure 11: Comparison of cutoff and stretch approximations for betweenness increasing the 
number of eigenpairs from 1 to 25 on ER(n = 100, 9 = 4) and BA(n = 100, r = 2). 



— 1 to 1 where values close to —1 indicate negative correlation (one ranking is 
the reverse of the other), values close to correspond to null correlation (the 
two rankings are independent), and values close to 1 denote positive correlation 
(the two rankings are similar). A good approximation, in our assessment, is a 
method that approaches the exact ranking with correlation of at least 0.9. 

The effectiveness experiments are run on ER and BA graphs with 100 nodes 
and an increasing number of edges: we used q = 2, 4, 8 for the the ER model 
and r = 1,2, 10 for the BA model. The corresponding ER and BA graphs have 
roughly the same density. Notice that a BA graph with r — 1 is a tree, hence 
an acyclic graph, while r > 1 generates graphs with loops. We always generated 
a sample of 100 such graphs and took the average of the observed correlation 
coefficients. 

Figure [10] shows the effectiveness of the cutoff approximation for betweenness 
on ER and B A graphs with increasing density. The quality of the approximation 
increases with the number of eigenpairs that are used in the approximation 
and decreases, for both network models, as the graphs become more dense. 
A cumulative comparison between cutoff and stretch approximations as well 
as among ER and BA network models is given in Figures 111! The stretch 
approximation performs neatly better than the cutoff one. Regardless of the 
graph model and of the graph density, the effectiveness of the stretch method 
is well above the quality threshold of 0.9 already with a single eigenpair. 

Finally, we tested the probabilistic approximation of betweenness that uses 
only a sample of node pairs for solving Equation [101 In Figures [12] we compare 
the probabilistic approach on both the exact version of betweenness, that uses 
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Figure 12: Comparison of the probabilistic approach on both exact and stretch betweenness 
(one eigenpair) increasing the size of the node pair sample from 2.5% to 25% on ER(n = 
100, g = 4) and BA(n = 100, r = 2). 



the full generalized Laplacian inverse matrix for the computation of betweenness 
scores, and the stretch version of betweenness, that uses the stretch approxi- 
mation method with one eigenpair only. The plot shows the effectiveness of 
the combined approximation algorithm that uses the stretch method (with a 
single eigenpair) for the approximation of the generalized Laplacian inverse and 
the probabilistic approach for the computation of betweenness scores: less than 
10% of the node pair space is sufficient to get an approximated ranking that 
correlates at 0.9 with the exact ranking. 

6.4- Efficiency of approximations 

In this section we show the outcomes of experiments about the efficiency of 
the proposed approximations for the generalized inverse of the Laplacian matrix. 
We compare elapsed time of the computation of the full generalized inverse of the 
Laplacian with that of the computation of the approximation that uses only one 
eigenpair (the second smallest eigenvalue and the corresponding eigenvector). 

Figure [T3] shows the elapsed time necessary for the computation of the gen- 
eralized Laplacian inverse for BA and ER networks. The time growth is 0(n 3 ) 
and the memory requirement is 0(n 2 ) on both models, where n is the number 
of nodes of the graph. These costs make the computation not feasible on large 
networks. For instance, for a large network with 10 6 nodes, the necessary time 
would be 642 days with a storage of about one terabyte. On the other hand, 
Figure Q31 gives the elapsed time necessary for the computation of the approxi- 
mation that uses only one eigenpair for BA and ER networks. For both models, 
the time growth is 0(n 15 ) and the memory requirement is 0(n). Notice that 
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Figure 13: Efficiency of the computation of the generalized inverse of the Laplacian matrix of 
BA graphs (r = 2, plot on the left) and ER graphs (q = 4, plot on the right) with increasing 
number of nodes. The data fits a cubic curve ax 3 + b with b = 8.906-1CT 2 and a = 5.545-1CT 11 
for the BA model (R-Square = 0.9992), and a cubic curve ax 3 + b with b = 3.038 • 10 _1 and 
a = 8.120 ■ lO" 11 for the ER model (R-Square = 0.9999). 
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Figure 14: Efficiency of the computation of the second smallest eigenpair of the Laplacian 
matrix of BA graphs (r = 2, plot on the left) and ER graphs (q = 4, plot on the right) with 
increasing number of nodes. The data fits a curve ax 15 + b with b = —1.292 and a = 4.25- 10~ 5 
for the BA model (R-Square = 0.9976), and a curve ax 15 + b with b = 1.426 ■ 10" 1 and 
a = 3.217 ■ 10~ 6 for the ER model (R-Square = 0.9877). 



the computation on ER graphs is one order of magnitude more efficient com- 
pared to BA graphs. These complexity make it possible to approximate the 
generalized inverse of the Laplacian, and in particular current-flow betweenness 
scores, on relatively large networks: on a standard machine, the computation 
on a random network with 10 6 nodes would last less than one hour and it would 
take about 12 hours (a reasonable time) on a scale- free network with the same 
number of nodes. In both cases the required storage is only one megabyte. 

7. Conclusion 

We have proposed effective and efficient methods for approximating the gen- 
eralized inverse Laplacian matrix of a graph, a matrix that arises in many graph- 
theoretic applications. A notable such application investigated in the present 
contribution is current-flow betweenness centrality, a measure for finding cen- 
tral nodes in a graph that lie between many other nodes. Our approximation 
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methods turn out to be suitable when the network at hand is large, so that 
computing the full generalized inverse is not realistic. 

Many other applications can be sought, for instance, the approximation of 
resistance distance matrix of a graph. Resistance distance is a metric on the 
graph, alterna tive to the classical sho r test- path distance, that was defined, inde- 
pendently, bvlSte phenson and Zelenl (Il989h . following an information-theoretic 
approach, and bv lKlein and Randid ( 19931) . following an electrical-theoretic ap- 
proach. The resistance distance notion has different int eresting interpretat ions 



and many applications, even outside of network science ([Ghosh et aUl2008f) . It 



turns out that resistance distance matrix c an be immediately d efined in terms 
of the generalized inverse Laplacian matrix ( Ghosh et a l.. 2008), allowing us to 
extend our approximation methods to resistance distance matrix as well. 
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